
    Integrating Rules and Dictionaries from Shallow-Transfer Machine Translation into Phrase-Based Statistical Machine Translation

    We describe a hybridisation strategy whose objective is to integrate linguistic resources from shallow-transfer rule-based machine translation (RBMT) into phrase-based statistical machine translation (PBSMT). It consists of enriching the phrase table of a PBSMT system with bilingual phrase pairs matching transfer rules and dictionary entries from a shallow-transfer RBMT system. This new strategy takes advantage of how the linguistic resources are used by the RBMT system to segment the source-language sentences to be translated, and overcomes the limitations of existing hybrid approaches that treat the RBMT system as a black box. Experimental results confirm that our approach delivers translations of higher quality than existing ones, and that it is especially useful when the parallel corpus available for training the SMT system is small or when translating out-of-domain texts that are well covered by the RBMT dictionaries. A combination of this approach with a recently proposed unsupervised shallow-transfer rule inference algorithm results in significantly greater translation quality than that of a baseline PBSMT system; in this case, the only hand-crafted resources used are the dictionaries commonly used in RBMT. Moreover, the translation quality achieved by the hybrid system built with automatically inferred rules is similar to that obtained by those built with hand-crafted rules. Research funded by the Spanish Ministry of Economy and Competitiveness through projects TIN2009-14009-C02-01 and TIN2012-32615, by Generalitat Valenciana through grant ACIF 2010/174, and by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAP-GA-2012-324414 (Abu-MaTran).

    A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora

    Statistical and rule-based methods are complementary approaches to machine translation (MT) that have different strengths and weaknesses. This complementarity has, over the last few years, led to growing interest in hybrid systems that combine data-driven and linguistic approaches. In this paper, we address the situation in which the amount of bilingual resources available for a particular language pair is not sufficiently large to train a competitive statistical MT system, but the cost and slow development cycles of rule-based MT systems cannot be afforded either. In this context, we formalise a new method that uses scarce parallel corpora to automatically infer a set of shallow-transfer rules to be integrated into a rule-based MT system, thus avoiding the need for human experts to handcraft these rules. Our work is based on the alignment template approach to phrase-based statistical MT, but the definition of the alignment template is extended to encompass different generalisation levels. It is also greatly inspired by the work of Sánchez-Martínez and Forcada (2009), in which alignment templates were also considered for shallow-transfer rule inference. However, our approach overcomes many relevant limitations of that work, principally those related to the inability to find the correct generalisation level for the alignment templates, and to select the subset of alignment templates that ensures an adequate segmentation of the input sentences by the rules eventually obtained. Unlike previous approaches in the literature, our formalism does not require linguistic knowledge about the languages involved in the translation. Moreover, it is the first time that conflicts between rules are resolved by choosing the most appropriate ones according to a global minimisation function rather than proceeding in a pairwise greedy fashion.
Experiments conducted using five different language pairs with the free/open-source rule-based MT platform Apertium show that translation quality significantly improves when compared to the method proposed by Sánchez-Martínez and Forcada (2009), and is close to that obtained using handcrafted rules. For some language pairs, our approach is even able to outperform them. Moreover, the resulting number of rules is considerably smaller, which eases human revision and maintenance. Research funded by Universitat d’Alacant through project GRE11-20, by the Spanish Ministry of Economy and Competitiveness through projects TIN2009-14009-C02-01 and TIN2012-32615, by Generalitat Valenciana through grant ACIF/2010/174, and by the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement PIAP-GA-2012-324414 (Abu-MaTran).

    Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

    This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. To measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic descriptions outperform part-of-speech tags for some language pairs. In contrast, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result. Work funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 825299, project Global Under-Resourced Media Translation (GoURMET).
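The interleaving described in this abstract (a single tag placed before each word) can be illustrated with a minimal sketch. The function name `interleave_tags`, the angle-bracket tag format, the dummy tag `X` and the example tags are assumptions made for illustration; the abstract does not specify how tags are encoded.

```python
from typing import List, Tuple


def interleave_tags(tagged_words: List[Tuple[str, str]]) -> List[str]:
    """Return a token stream with one tag token placed before each word.

    The "<TAG>" surface form is a hypothetical choice, not the paper's format.
    """
    stream: List[str] = []
    for word, tag in tagged_words:
        stream.append(f"<{tag}>")  # tag token first
        stream.append(word)        # then the word it annotates
    return stream


# Toy sentence with hypothetical part-of-speech tags.
sentence = [("the", "DET"), ("cats", "NOUN"), ("sleep", "VERB")]

# Part-of-speech annotation.
pos_stream = interleave_tags(sentence)

# Dummy annotation: same stream layout, but the tag carries no information.
dummy_stream = interleave_tags([(word, "X") for word, _ in sentence])

print(" ".join(pos_stream))
print(" ".join(dummy_stream))
```

A morpho-syntactic description tag would use the same layout with a richer tag string (part of speech plus morphological features), so the three annotation types differ only in what the tag token contains.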

    Unravelling the role of triisopropylphosphane telluride in Ag(I) complexes

    The coordination chemistry of chalcogenide ligands has always attracted significant interest in the field of inorganic chemistry, especially for soft metals such as those of group 11. Despite the scarcity of research on phosphane tellurides, we report on the synthesis and characterisation of five novel silver complexes containing the phosphane telluride ligand, TeP(iPr)3, along with other ancillary ligands such as mono- or diphosphanes. Spectroscopic studies were performed to investigate the behaviour of these complexes, including their redox properties, as demonstrated by the 1,1′-diphenylphosphaneferrocene (dppf) silver derivatives. Additionally, these complexes display a remarkably rapid interchange equilibrium, revealing silver species with distinctive Ag2Te2 cores and a combination of bridging and terminal TeP(iPr)3 ligands. A promising avenue for further investigation and potential applications emerges.

    Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

    In the context of neural machine translation, data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce. Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences. In this paper, we propose to follow a completely different approach and present a multi-task DA approach in which we generate new sentence pairs with transformations, such as reversing the order of the target sentence, which produce unfluent target sentences. During training, these augmented sentences are used as auxiliary tasks in a multi-task framework with the aim of providing new contexts where the target prefix is not informative enough to predict the next word. This strengthens the encoder and forces the decoder to pay more attention to the source representations of the encoder. Experiments carried out on six low-resource translation tasks show consistent improvements over the baseline and over DA methods aiming at extending the support of the empirical data distribution. The systems trained with our approach rely more on the source tokens, are more robust against domain shift and suffer fewer hallucinations. Work funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 825299, project Global Under-Resourced Media Translation (GoURMET); and by Generalitat Valenciana through project GV/2021/064. The computational resources used for the experiments were funded by the European Regional Development Fund through project IDIFEDER/2020/003.
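As a rough illustration of the transformation named in this abstract (reversing the order of the target sentence), the sketch below builds an augmented corpus in which each original pair is accompanied by an auxiliary-task pair. The `<rev>` task token prepended to the source, the function names and the whitespace tokenisation are assumptions for illustration; the abstract does not say how auxiliary tasks are marked.

```python
from typing import List, Tuple

SentencePair = Tuple[str, str]


def reverse_target(src: str, tgt: str, task_token: str = "<rev>") -> SentencePair:
    """Build an auxiliary-task pair whose target has its word order reversed.

    The task token on the source side (an assumed convention) tells the model
    which task the pair belongs to.
    """
    reversed_tgt = " ".join(reversed(tgt.split()))
    return f"{task_token} {src}", reversed_tgt


def augment(corpus: List[SentencePair]) -> List[SentencePair]:
    """Return the original pairs followed by one reversed-target pair each."""
    return list(corpus) + [reverse_target(src, tgt) for src, tgt in corpus]


corpus = [("el gato duerme", "the cat sleeps")]
for src, tgt in augment(corpus):
    print(f"{src} ||| {tgt}")
```

With a reversed target, the words already generated give the decoder little help in predicting the next one, which is the situation the abstract describes as forcing more reliance on the encoder's source representations.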

    Exploiting large pre-trained models for low-resource neural machine translation

    Pre-trained models have revolutionized the natural language processing field by leveraging large-scale language representations for various tasks. Some pre-trained models offer general-purpose representations, while others are specialized in particular tasks, like neural machine translation (NMT). Multilingual NMT-targeted systems are often fine-tuned for specific language pairs, but there is a lack of evidence-based best-practice recommendations to guide this process. Additionally, deploying these large pre-trained models in computationally restricted environments, typically found in developing regions where low-resource languages are spoken, has become challenging. We propose a pipeline to tune the mBART50 pre-trained model to 8 diverse low-resource language pairs, and then distill the resulting system to obtain lightweight and more sustainable NMT models. Our pipeline conveniently exploits back-translation, synthetic corpus filtering, and knowledge distillation to deliver efficient bilingual translation models that are 13 times smaller, while maintaining a close BLEU performance. This paper is part of the R+D+i project PID2021-127999NB-I00 funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033) and the European Regional Development Fund "A way to make Europe". The computational resources used were funded by the European Regional Development Fund through project IDIFEDER/2020/00

    Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

    When the number of parallel sentences available to train a neural machine translation system is small, a common practice is to generate new synthetic training samples from them. A number of approaches have been proposed to produce synthetic parallel sentences that are similar to those in the parallel data available. These approaches work under the assumption that non-fluent target-side synthetic training samples can be harmful and may deteriorate translation performance. Even so, in this paper we demonstrate that synthetic training samples with non-fluent target sentences can improve translation performance if they are used in a multilingual machine translation framework as if they were sentences in another language. We conducted experiments on ten low-resource and four high-resource translation tasks and found that this simple approach consistently improves translation performance as compared to state-of-the-art methods for generating synthetic training samples similar to those found in corpora. Furthermore, this improvement is independent of the size of the original training corpus, and the resulting systems are much more robust against domain shift and produce fewer hallucinations. This paper is part of the R+D+i project PID2021-127999NB-I00 funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033) and the European Regional Development Fund "A way to make Europe". The computational resources used were funded by the European Regional Development Fund through project IDIFEDER/2020/003.

    Geomatic methods applied to the change study of the la Paúl Rock Glacier, Spanish Pyrenees

    Rock glaciers are one of the most important features of the mountain permafrost in the Pyrenees. La Paúl is an active rock glacier located on the north face of the Posets massif in the La Paúl glacier cirque (Spanish Pyrenees). This study presents the preliminary results of the La Paúl rock glacier monitoring works carried out since 2013 with two geomatic technologies: Global Navigation Satellite System (GNSS) receivers and Terrestrial Laser Scanning (TLS) devices. Displacements measured on the rock glacier surface have demonstrated both the activity of the rock glacier and the utility of this equipment for analysing rock glacier dynamics. The glacier has exhibited the fastest displacements on its west side (over 35 cm yr⁻¹), affected by the Little Ice Age, and in its frontal area (over 25 cm yr⁻¹). As an indicator of permafrost in marginal environments, and because of its peculiar morphology, the La Paúl rock glacier warrants longer-term study and the application of further geomatic techniques for its detailed analysis. Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (project CGL2015-68144-R); Junta de Extremadura - Fondo Europeo de Desarrollo Regional (project GR10071

    Product-country image and crises in the Spanish horticultural sector: classification and impact on the market

    This research provides a conceptual framework to analyse the concept of ‘crisis’ and its multiple origins in the Spanish horticultural sector, the largest horticultural exporter in Europe. For this purpose, this study provides a typology of crises and a classification according to their nature, reasons, and temporary impact. Consequently, this research reviews and chronologically classifies the harmful international campaigns that have originated several of those crises. Additionally, the impact on the perceived product-country image is analysed through empirical research based on the results of a survey of final consumers in several European countries. Serrano-Arcos, MM.; Pérez-Mesa, JC.; Sánchez-Fernández, R. (2018). Product-country image and crises in the Spanish horticultural sector: Classification and impact on the market. Economía Agraria y Recursos Naturales - Agricultural and Resource Economics. 18(1):111-133. doi:10.7201/earn.2018.01.05

    Solving the Location Area Problem by Using Differential Evolution

    In mobile networks, determining the best partitioning in the Location Area problem is a hard task, but it is also an important strategy for reducing the management costs involved. In this paper we present a new approach to solving the location management problem, based on Location Area partitioning, as a cost optimization problem. We use a Differential Evolution based algorithm to find the best configuration for the Location Areas in a mobile network. We try to find the best values for the Differential Evolution parameters, as well as to define the scheme that enables us to obtain better results, when compared to classical strategies and to other authors’ results. To obtain the best solution we develop four distinct experiments, each one applied to one Differential Evolution parameter. This is a new approach to this problem that has given us good results.
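For readers unfamiliar with Differential Evolution, the sketch below shows one common variant (DE/rand/1/bin) minimising a toy cost function. It is not the authors' implementation: the abstract does not state which scheme or parameter values were used, and the Location Area cost function is not given, so the sphere function stands in for it. The names `differential_evolution`, `sphere`, `pop_size`, `f` and `cr` are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple


def differential_evolution(
    cost: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    pop_size: int = 20,     # population size
    f: float = 0.5,         # differential weight (mutation factor)
    cr: float = 0.9,        # crossover probability
    generations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `cost` with the DE/rand/1/bin scheme (an assumed variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(max(value, lo), hi)  # clip to the search bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = cost(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x: List[float]) -> float:
    """Toy stand-in cost; a real Location Area cost would replace this."""
    return sum(v * v for v in x)


best_vector, best_cost = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_vector, best_cost)
```

Applying this to Location Area partitioning would additionally require decoding each candidate vector into an assignment of cells to areas and evaluating the management cost of that assignment, neither of which the abstract specifies.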